Abstract

Background: Here we present two new computer tools, PREMIM and EMIM, for the estimation of parental and child 
genetic effects, based on genotype data from a variety of different child-parent configurations. PREMIM allows the 
extraction of child-parent genotype data from standard-format pedigree data files, while EMIM uses the extracted 
genotype data to perform subsequent statistical analysis. The use of genotype data from the parents as well as from 
the child in question allows the estimation of complex genetic effects such as maternal genotype effects, maternal- 
foetal interactions and parent-of-origin (imprinting) effects. These effects are estimated by EMIM, incorporating 
chosen assumptions such as Hardy-Weinberg equilibrium or exchangeability of parental matings as required. 

Results: In application to simulated data, we show that the inference provided by EMIM is essentially equivalent to 
that provided by alternative (competing) software packages such as MENDEL and LEM. However, PREMIM and EMIM 
(used in combination) considerably outperform MENDEL and LEM in terms of speed and ease of execution. 

Conclusions: Together, EMIM and PREMIM provide easy-to-use command-line tools for the analysis of pedigree data, 
giving unbiased estimates of parental and child genotype relative risks. 
Background

Genomewide association studies have popularized the 
use of the case/control design to detect effects associated
with an individual's own genotype. However, many
diseases (especially those related to pregnancy outcomes)
may in fact be due to more complex effects
such as maternal genotype effects, maternal-fetal geno- 
type interactions or parent-of-origin (imprinting) effects. 
To detect such effects it is necessary to collect geno- 
type data from one or both parents of cases, in addition 
to genotyping the cases themselves. Two existing popu- 
lar approaches analyse either genetic data from affected 
offspring and their mothers (case/mother duos), along 
with an appropriate control sample [1-3], or else anal- 
yse genetic data from affected offspring and both parents 
(case/parent trios), without use of controls [4-6]. In con- 
trast, our software EMIM uses a multinomial modelling 
approach [7] that allows the simultaneous consideration
of both case/mother duos and/or case/parent trios,
with additional child and parent genotype data (such as 
individual cases and controls, case/father duos and con- 
trol matings) included when available. The child-parent 
genotype data can be extracted from standard PLINK- 
format [8] pedigree files using our companion software 
PREMIM. 

Full details and evaluation of the multinomial mod- 
elling approach used by EMIM have been described pre- 
viously [7]. The early beta version of EMIM described 
in [7] allowed a more limited set of child-parent config- 
urations than are supported in the current version, and 
did not include the current full range of optional like- 
lihood assumptions (such as conditioning on parental 
genotypes (CPG) [6,9]). Most importantly, the com- 
panion program PREMIM was not available, limit- 
ing the ease with which EMIM could be applied to 
real data. 
PREMIM: Pedigree file conversion

For each SNP in turn, PREMIM performs a simple algo- 
rithm to select from each pedigree the most informative 
sub-unit of child-parent genotype data. Different pedigree 
sub-units are chosen in order of preference as listed in 
Table 1. 

There are a number of options that may be given to PRE- 
MIM. In particular, it is possible to override the default 
choice of individuals by stating a proband subject for 
certain pedigrees. These proband subjects are then cho- 
sen as cases (with parents where available). This may be 
useful to avoid possible bias when larger pedigrees have 
been ascertained on the basis of a specific affected indi- 
vidual. For larger pedigrees, it is also possible to select 
multiple case/parent trios or multiple control matings 
from each pedigree, potentially increasing the power to 
detect genetic effects. This option does have the poten- 
tial to generate bias (depending on the analysis options 
chosen [6,10]), and so results should be interpreted with 
caution, although we anticipate that most people will 
apply these types of method to small pedigrees such as 
child/parent trios, making this issue less of a concern 
in practice. (Alternative methods for dealing with larger 
pedigrees, valid under the assumptions of random mating 
and/or Hardy-Weinberg equilibrium (HWE), have been
described by [10,11]). 

EMIM methodology 

The basic principle behind EMIM is simple: to test for the 
existence of (and estimate) genotype relative risk param- 
eters that increase (or decrease) the probability that a 
child is affected. By default, PREMIM chooses the minor 
allele to be considered as the 'risk' allele, although this 
option can be overridden if required. We denote by R\ 
{R 2 ) the factor by which an individual's disease risk is 
multiplied if they possess one (two) risk alleles at a given 

Table 1 The order of preference of pedigree sub-units chosen by PREMIM for each SNP

Order  Pedigree sub-unit
1      case/parent trio
2      case/mother duo
3      case/father duo
4      case
5      case parental mating
6      case mother
7      case father
8      control parental mating
9      control/mother duo
10     control/father duo
11     control



locus. We denote by $S_1$ ($S_2$) the factor by which an individual's
disease risk is multiplied if their mother possesses
one (two) risk alleles at that locus. We denote by $I_m$
($I_p$) the factor by which an individual's disease risk is
multiplied if they inherit a risk allele from their mother
(father). Lastly, to test for mother-child interactions, we
denote by $\gamma_{ij}$ the factor by which an individual's disease
risk is multiplied if the mother carries $i$ risk alleles and
the child carries $j$ risk alleles. A summary of these relative
risk parameters is shown in Table 2. A variety of
restrictions may be made on the parameters as desired.
For example, a multiplicative model for the effects of the
alleles in the mother ($S_2 = S_1^2$) or child ($R_2 = R_1^2$)
may be imposed. EMIM also supports several
alternative previously proposed parameterizations for the
imprinting and interaction effects [4,5] (see [7] for more
details).

As an example, denote the major and minor alleles by
1 and 2. Then, for a case/parent trio where the genotypes
of the mother, father and child are 22, 11 and 12, respectively,
the penetrance is modelled as:

$$P(\text{child diseased} \mid g_m = 22,\ g_f = 11,\ g_c = 12) = \alpha R_1 S_2 I_m \gamma_{21}$$

where $\alpha$ is the baseline probability of disease and $g_m$, $g_f$
and $g_c$ are the genotypes of the mother, father and child.
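
As a rough illustration of how such a penetrance is assembled, the following Python sketch multiplies together the relative risk factors that apply to a given child. It is not taken from EMIM: the function name `penetrance`, the dictionary keys and all numerical values are assumptions made purely for this example.

```python
def penetrance(mother_risk, from_mother, from_father, params, alpha):
    """Sketch of the modelled probability of disease for one child.

    mother_risk : number of risk alleles (0, 1 or 2) carried by the mother
    from_mother : 1 if the child inherited a risk allele from the mother, else 0
    from_father : 1 if the child inherited a risk allele from the father, else 0
    params      : relative risk parameters; any that are absent default to 1
    alpha       : baseline probability of disease
    """
    child_risk = from_mother + from_father
    risk = alpha
    if child_risk > 0:                      # child genotype effect R1 or R2
        risk *= params.get("R%d" % child_risk, 1.0)
    if mother_risk > 0:                     # maternal genotype effect S1 or S2
        risk *= params.get("S%d" % mother_risk, 1.0)
    if from_mother:                         # maternal imprinting effect Im
        risk *= params.get("Im", 1.0)
    if from_father:                         # paternal imprinting effect Ip
        risk *= params.get("Ip", 1.0)
    if mother_risk > 0 and child_risk > 0:  # mother-child interaction gamma_ij
        risk *= params.get("gamma%d%d" % (mother_risk, child_risk), 1.0)
    return risk


# Hypothetical parameter values, chosen only for illustration.
example_params = {"R1": 1.5, "S2": 3.0, "Im": 1.8, "gamma21": 2.0}

# Mother 22, father 11, child 12: the child's single risk allele is maternal,
# so the factors applied are alpha * R1 * S2 * Im * gamma21.
example_risk = penetrance(2, 1, 0, example_params, alpha=0.01)
```

Passing the parent of origin explicitly sidesteps the one case in which it cannot be deduced from genotypes alone (a heterozygous child of two heterozygous parents).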

EMIM uses a multinomial model to estimate the rela- 
tive risk parameters on the basis of observed counts of 
genotype combinations in case/parent trios as shown in 
Table 3. EMIM models the 15 different cell probabili- 
ties (corresponding to the 15 possible combinations of 



Table 2 The relative risk parameters estimable by EMIM

Parameter        Description
$R_1$            Child has one minor allele (child genotype effect)
$R_2$            Child has two minor alleles (child genotype effect)
$S_1$            Mother has one minor allele (maternal genotype effect)
$S_2$            Mother has two minor alleles (maternal genotype effect)
$\gamma_{11}$    Mother has one minor allele and child has one minor allele (mother-child interaction effect)
$\gamma_{12}$    Mother has one minor allele and child has two minor alleles (mother-child interaction effect)
$\gamma_{21}$    Mother has two minor alleles and child has one minor allele (mother-child interaction effect)
$\gamma_{22}$    Mother has two minor alleles and child has two minor alleles (mother-child interaction effect)
$I_m$            The child receives a minor allele from the mother (maternally operating imprinting effect)
$I_p$            The child receives a minor allele from the father (paternally operating imprinting effect)

Table 3 Observed genotype combinations in case/parent trios
a $g_m$, $g_f$, $g_c$ = genotypes of mother, father, child, respectively.
b CEPG = conditional on exchangeable parental genotypes.
c CPG = conditional on parental genotypes.



genotypes that are consistent with Mendelian inheritance)
in terms of the desired genotype relative risk parameters
($R_1, R_2, S_1, S_2, I_m, I_p, \gamma_{11}, \gamma_{12}, \gamma_{21}, \gamma_{22}$). A maximum of
7 parameters are estimable, meaning that not all of these
parameters can be estimated simultaneously. Cordell et al.
[12] suggested building up models from simpler to more 
complex via a series of nested hypothesis tests. Given a 
model for the penetrances in terms of the genotype rela- 
tive risk parameters, the overall likelihood for the data in 
Table 3 may be written 



$$\prod_{i=1}^{15} \left\{ P(g_{m_i}, g_{f_i}, g_{c_i} \mid \text{child diseased}) \right\}^{n_i}$$

where $(g_{m_i}, g_{f_i}, g_{c_i})$ represent the genotypes of a mother,
father and child in genotype combination $i$, and $n_i$ is the
observed count of that combination. The probabilities
$P(g_{m_i}, g_{f_i}, g_{c_i} \mid \text{child diseased})$ may be written in terms
of the genotype relative risk parameters of interest and six
nuisance parameters $\mu_1, \ldots, \mu_6$ (corresponding to mating
type stratification parameters as indexed in Table 3, see
[4,7,13] for details).

If any of the subjects are missing, we no longer have 
15 genotype counts as shown in Table 3, but instead we 
must collapse together rows to express the data in terms of 
counts of observed genotype combinations. For example, 
given data for case/mother duos (i.e. all fathers missing), 



the 7 observable counts are as shown in Table 4. The 
likelihood for the data in this table may be written 



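$$\prod_{i=1}^{7} \left\{ P(g_{m_i}, g_{c_i} \mid \text{child diseased}) \right\}^{n_i}
= \prod_{i=1}^{7} \left\{ \sum_{g_f} P(g_{m_i}, g_f, g_{c_i} \mid \text{child diseased}) \right\}^{n_i}$$

where $(g_{m_i}, g_{c_i})$ represent the genotypes of a mother and
child in (Table 4) genotype combination $i$, $n_i$ is the observed
count of that combination, and the sum is taken over the
unobserved genotype $g_f$ of the father.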
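
The collapsing step can be sketched in a few lines of Python. The example below is only an illustration and makes strong simplifying assumptions that are not part of the general EMIM model: cell probabilities are computed under HWE, random mating and no genetic effects (so that conditioning on disease status changes nothing), and the risk allele frequency `q = 0.3` is arbitrary. All function names are hypothetical.

```python
import math
from itertools import product


def hwe_freqs(q):
    """Genotype frequencies (0, 1, 2 risk alleles) under HWE."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}


def child_prob(gm, gf, gc):
    """Mendelian probability of child genotype gc given parents gm and gf."""
    return sum(
        (gm / 2 if a else 1 - gm / 2) * (gf / 2 if b else 1 - gf / 2)
        for a, b in product((0, 1), repeat=2)  # allele from mother, father
        if a + b == gc
    )


def trio_cells(q):
    """Cell probabilities P(gm, gf, gc) under the illustrative null model."""
    f = hwe_freqs(q)
    cells = {}
    for gm, gf, gc in product((0, 1, 2), repeat=3):
        p = f[gm] * f[gf] * child_prob(gm, gf, gc)
        if p > 0:  # keep only combinations consistent with Mendelian inheritance
            cells[(gm, gf, gc)] = p
    return cells


def collapse_to_duos(cells):
    """Sum trio cell probabilities over the unobserved father genotype."""
    duos = {}
    for (gm, gf, gc), p in cells.items():
        duos[(gm, gc)] = duos.get((gm, gc), 0.0) + p
    return duos


def multinomial_loglik(counts, probs):
    """Log likelihood sum_i n_i * log(p_i), up to an additive constant."""
    return sum(n * math.log(probs[cell]) for cell, n in counts.items() if n > 0)


trios = trio_cells(q=0.3)       # 15 Mendelian-consistent trio cells
duos = collapse_to_duos(trios)  # 7 observable mother/child cells
```

Because independent tables contribute multiplicatively to the likelihood, their log likelihoods computed in this way can simply be added.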



Table 4 Observed genotype combinations in case/mother duos
a $g_m$, $g_c$ = genotypes of mother, child, respectively.

In practice, at any given SNP, we observe genotype
counts (some of which may equal 0) for the following 
types of unit: case/parent trios (15 possible genotype 
combinations); parents of cases (9 possible genotype com- 
binations); case/mother duos (7 possible combinations); 
case/father duos (7 possible combinations); mothers of 
cases (3 possible combinations); fathers of cases (3 pos- 
sible combinations); cases (3 possible combinations). The 
data for each unit creates a table corresponding to a 
(possibly collapsed) version of Table 3, and the overall 
likelihood to be maximized may be constructed as the 
product of the likelihoods for the individual tables. Sim- 
ilarly, we may add in data for controls (either unaffected 
individuals or population-based controls of unknown dis- 
ease status) by further multiplying the likelihood by the 
product of the likelihoods for a similar set of control 
tables. EMIM makes use of the following types of control 
unit: parents of controls (9 possible genotype combina- 
tions); control/mother duos (7 possible combinations); 
control/father duos (7 possible combinations); controls 
(3 possible combinations). Furthermore, EMIM assumes 
that the frequencies of the different genotype combina- 
tions in control units correspond to those in the general 
population. This is equivalent to making a rare disease 
assumption, in the event that the controls are all genuinely 
unaffected. 

By default, EMIM assumes 'mating symmetry' [13] 
(equivalent to a 'conditional on exchangeable parental 
genotypes' (CEPG) [12] model), which corresponds to 
assuming that parental matings ($g_m = i$, $g_f = j$) are as
likely as matings ($g_m = j$, $g_f = i$). This results in the
estimation of six mating type stratification parameters [13]
$\mu_1, \ldots, \mu_6$ (see Table 3). Two more restricted (and therefore
potentially more powerful) models are also available
in EMIM: 

1. A model that assumes parental allelic exchangeability 
(PAE) [2] (which corresponds in this context to 
assuming that $\mu_4 = \mu_3$)

2. A model that assumes Hardy-Weinberg equilibrium
(HWE) and random mating, estimating a single allele 
frequency parameter in place of the six mating type 
stratification parameters. 

In addition to these more restricted models, a less 
restricted 'conditional on parental genotypes' (CPG) 
[2,9,12] model (that results in the estimation of nine mating
type stratification parameters $\mu_1, \ldots, \mu_9$, see Table 3)
is also available. This model would be expected to be 
less powerful than the CEPG, PAE or HWE models, but 
should be more robust to any departure from mating 
symmetry, PAE or HWE. 
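
The numbers of nuisance parameters quoted above follow from simple counting, as the following Python sketch illustrates. It is an illustration only: the genotype labels and the function `hwe_mating_freq` are assumptions introduced for this example and do not reflect EMIM's internal parameterization.

```python
from itertools import combinations_with_replacement, product

GENOTYPES = ("11", "12", "22")  # allele 2 is taken to be the minor allele

# CPG: one stratification parameter per ordered (mother, father) mating.
ordered_matings = list(product(GENOTYPES, repeat=2))

# CEPG (mating symmetry): matings (i, j) and (j, i) share one parameter.
unordered_matings = list(combinations_with_replacement(GENOTYPES, 2))


def hwe_mating_freq(gm, gf, q):
    """Mating frequency under HWE and random mating, which is governed
    by the single minor allele frequency q."""
    freq = {"11": (1 - q) ** 2, "12": 2 * q * (1 - q), "22": q ** 2}
    return freq[gm] * freq[gf]


print(len(ordered_matings), len(unordered_matings))
```

Under HWE and random mating the mating frequencies are automatically symmetric, which is one way to see that the HWE model is nested within the CEPG model.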

EMIM reads in genotype data from input files created by
PREMIM. In addition, two other files are required
by EMIM. The first is a file 'emimmarkers.dat', which provides
the minor allele frequencies for each SNP (used as starting
values in the maximization algorithm). These can optionally
be estimated by PREMIM using the pedigree data,
although other (e.g. population-based) sources for this
information may be preferred where available. (See [7] for
an investigation of EMIM's sensitivity to misspecification
of the assumed or estimated allele frequencies.) The other
required file is a parameter file 'emimparams.dat', describing
the type of analysis that EMIM should perform, which
parameters to estimate, and which assumptions (such as
HWE or PAE) should be made.

Implementation 

PREMIM is written in C++. For a binary pedigree
file with 913 pedigrees, 1730 subjects and 45323 SNPs,
it takes 19 seconds to run on a six-core AMD
Opteron™ processor with 2.6 GHz CPUs. EMIM is written
in FORTRAN 77 and makes use of a subroutine
MAXFUN, originally written as part of the S.A.G.E. [14] 
package. For these same data (pre-processed by PREMIM) 
on the same machine, EMIM takes 1 minute and 22 sec- 
onds to perform an analysis to test for multiplicative child 
genotype effects, assuming HWE. For larger data sets, 
EMIM and PREMIM have options that allow easy parallel 
processing by dividing the SNPs to analyse into different 
batches. 

Results and discussion 

Example analysis using simulated data 

We used the program SimPed [15] to generate a sin- 
gle replicate of simulated data for 200 case/parent trios, 
200 case/mother duos, 200 control/mother duos and 1000 
unrelated controls at 8000 SNPs across a chromosome. 
We used a simplified linkage disequilibrium (LD) model 
that assumed LD operated in haplotype blocks, each of 
length 8 SNPs. We simulated child genotype effects ($R_1 = 1.5$
and $R_2 = 2.25$) at SNP 76 and maternal genotype
effects ($S_1 = 2$ and $S_2 = 3$) at SNP 6004. We then used
EMIM to test for maternal effects, with and without allowing
for child genotype effects (Figure 1C, Figure 1A), and
to test for child genotype effects, with and without allowing
for maternal effects (Figure 1D, Figure 1B). In all four
analyses, we see a strong signal at the correct location, 
with the high significance probably due to the relatively 
large effect sizes assumed. 

A tutorial for this example (with a listing of the required 
commands) is available on the PREMIM and EMIM
website: http://www.staff.ncl.ac.uk/richard.howey/emim/example.html

Comparison of HWE, PAE, CEPG and CPG likelihoods 

The power to detect genetic effects can vary depend- 
ing on the assumptions made. As a demonstration, we 

Figure 1 Genetic Effects. Plot of the $-\log_{10}$ p-values for each SNP given by EMIM to detect: (A) child genetic effects; (B) maternal genetic effects;
(C) maternal genetic effects whilst allowing for child effects; and (D) child genetic effects whilst allowing for maternal effects.
[Panel titles: Maternal Genetic Effects Analysis; Child Genetic Effects Analysis; Maternal Genetic Effects Analysis allowing for Child Genetic Effects; Child Genetic Effects Analysis allowing for Maternal Genetic Effects.]



simulated 1000 replicates of data at a single SNP for a 
sample consisting of 50 of each of the following units: 
case/parent trios, case/mother duos, case/father duos, 
control matings, control/mother duos and control/father 
duos. We assumed either a child genotype effect ($R_2 = 2$),
a maternal genotype effect ($S_2 = 2$), or a maternal
imprinting effect ($I_m = 1.8$). PREMIM and EMIM were
used to estimate the parameters $R_1$, $R_2$, $S_1$, $S_2$ and $I_m$
for each different likelihood assumption and for each set
of simulated data. Figure 2(A-C) shows that the power
to detect the relevant effect decreases as one makes
less restrictive (but potentially more robust) assumptions,
while Figure 2(D-F) shows that unbiased parameter
estimation is achieved using the most restrictive
assumption (HWE) (provided that assumption is correct).
Similar unbiased parameter estimation is achieved for the 
other likelihood assumptions, when they are met (data 
not shown). 



Effect of missing data on power 

As a demonstration of the effect that missing data has 
on the power, we performed analyses at a single SNP 
using simulated data (10,000 replicates, each replicate 
consisting of 100 case/parent trios and 100 control/parent 
trios) and assuming a range of probabilities of missing 
genotype data. We assumed a maternal genotype effect
($S_1 = 1.5$, $S_2 = 2.25$). The expected proportions of pedigree
units of different types remaining in the analysis, for the
case/parent trios and the control/parent trios, are
shown in Figure 3A and Figure 3B respectively. The trios
are all present when there is no missing data, but the 
expected proportion quickly decreases when the proba- 
bility of missing genotype data is increased. The expected 
proportion of the other pedigree types then increases, but 
subsequently decreases and converges to 0 as the proba- 
bility of missing data approaches 1. The power to detect 
the maternal genetic effects (when correctly modelled) 
also decreases with increasing proportion of missing data 

Figure 2 Comparison of Likelihood Assumptions in EMIM. Results from simulated data. (A-C) The power of likelihood ratio tests to achieve
significance levels (p-values) of 0.001, 0.01 and 0.05 for different simulated effects and likelihood assumptions: HWE - Hardy-Weinberg Equilibrium;
CEPG - Conditional on Exchangeable Parental Genotypes; CPG - Conditional on Parental Genotypes; PAE - Parental Allelic Exchangeability. (D-F) Box
plots on a log-scale of parameter estimates for $R_1$, $R_2$, $S_1$, $S_2$ and $I_m$, assuming HWE. Dotted lines show the true parameter values.



(Figure 3C). An advantage of the EMIM framework is 
that it makes efficient use of data from all possible avail- 
able individuals, allowing one to recover information even 
from incompletely genotyped trios. 
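
The qualitative shape of the curves described above can be reproduced with a short calculation. The Python sketch below assumes, purely for illustration, that each of the three genotypes in a case/parent trio is missing independently with the same probability, and that the remaining individuals are assigned to a sub-unit following the order of preference in Table 1. The function name and the missingness model are assumptions of this example rather than a description of the simulations reported here.

```python
def expected_unit_proportions(p_missing):
    """Expected proportion of each sub-unit obtained from a case/parent trio
    when each genotype is missing independently with probability p_missing."""
    p = p_missing        # probability that a genotype is missing
    o = 1.0 - p_missing  # probability that a genotype is observed
    return {
        "case/parent trio": o * o * o,      # child, mother and father observed
        "case/mother duo": o * o * p,       # father missing
        "case/father duo": o * p * o,       # mother missing
        "case": o * p * p,                  # both parents missing
        "case parental mating": p * o * o,  # child missing
        "case mother": p * o * p,           # child and father missing
        "case father": p * p * o,           # child and mother missing
        "no data": p * p * p,               # all three missing
    }


for p_missing in (0.0, 0.2, 0.5, 0.8):
    props = expected_unit_proportions(p_missing)
    print(p_missing, round(props["case/parent trio"], 3),
          round(props["case/mother duo"], 3))
```

Under these assumptions the trio proportion falls as the cube of the probability of an observed genotype, while each smaller sub-unit first rises and then falls towards 0, in line with the behaviour described for Figure 3A.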

Buyske [16] pointed out that maternal genotype effects 
can masquerade as child genotype effects, if analysed as 
such. If the maternal genetic effects are incorrectly mod- 
elled as child genetic effects (Figure 3D), we find limited 
power to detect these effects even when there is no miss- 
ing data. Increasing the proportion of missing data has 
little effect on the power of this analysis, until the prob- 
ability of missing genotype data becomes very large (e.g. 
more than 80%). 

Comparison with MENDEL 

Several other software packages exist that allow test- 
ing and estimation of genotype relative risk parameters 
similar to those tested in EMIM. One such package is 
MENDEL [17]. MENDEL most easily allows the estima- 
tion and testing of mother-child interaction effects via 
the maternal-fetal genotype incompatibility (MFG) test 
[5], although a "Generalized Risk" analysis that allows 
implementation of more complex user-defined parameterizations
(through the imposition of various parameter
restrictions) is also available. 

We used computer simulations (500 replicates each with 
200 case/parent trios) to compare the performance of



MENDEL and EMIM under three different comparable 
models: 

1. Model 1. This model has been used to test for RhD 
incompatibility [18] and estimates the relative risk 
corresponding to the mother having no risk alleles 
and the child one risk allele. MENDEL was used to 
estimate this one relative risk parameter by setting 
the sex-specific effects (parameters MFG_M and
MFG_F in MENDEL) to be equal. The equivalent
single parameter $\gamma_{01}$ (corresponding to the
parametrization of [5,18]) was estimated in EMIM.
The data were simulated assuming $\gamma_{01} = 2$.

2. Model 2. This model has been used to test for non- 
inherited maternal antigens (NIMA) on rheumatoid 
arthritis (RA) [19] and consists of three parameters 
(ignoring sex-specific MFG testing): a relative risk 
parameter ($\gamma_{10}$) for MFG when the mother has one
risk allele and the child has no risk alleles, and two 
parameters for child effects when the child has one 
or two risk alleles. In order to compare EMIM with 
MENDEL under this model, we used PREMIM to 
reassign which allele should be considered as the risk 
allele by EMIM. A model equivalent to MENDEL's 
NIMA model can then be fit in EMIM by estimating 
parameters (with respect to the reassigned allele) $R_1$,
$R_2$ and $\gamma_{12}$. Data were simulated assuming an MFG

Figure 3 Effect of Missing Genotype Data. Plots showing the effect as the probability of missing genotype data is increased for data simulated
with a maternal genetic effect. A: The expected proportions of different types of pedigree unit output by PREMIM from a set of case/parent trios, for
different probabilities of missing genotypes. B: The expected proportions of different types of pedigree unit output by PREMIM from a set of control
trios, for different probabilities of missing genotypes. C: Power of EMIM to detect maternal genetic effects (by estimating parameters $S_1$ and $S_2$). D:
Power of EMIM to detect maternal genetic effects masquerading as child genetic effects (by estimating parameters $R_1$ and $R_2$).



effect $\gamma_{10} = 2$. The power to detect the MFG
effect in either MENDEL or EMIM was calculated by
considering twice the difference between the
negative log likelihood from a model that includes all
three parameters ($R_1$, $R_2$ and the MFG parameter)
and that from a model where the MFG parameter 
has been removed. 
3. Model 3. This MENDEL model is a general MFG 
test consisting of one relative risk parameter for each 
of the 7 mother/child genotype combinations. The 
relative risk parameter denoted U_00 in the
MENDEL documentation (corresponding to the
situation where the mother and child have no risk
alleles) was set to 1 and not estimated to avoid
over-parametrization. The other 6 parameters, U_22,
U_21, U_12, U_11, U_10, U_01, were estimated. The 6
parameters estimated by EMIM were $R_1$, $R_2$, $S_1$, $S_2$,
$\gamma_{11}$ and $\gamma_{22}$. These parameters are not individually
equivalent to the 6 MENDEL parameters, but the
models as a whole can be shown to be equivalent.
Data for this comparison were simulated assuming
$R_1 = S_1 = \gamma_{11} = \gamma_{22} = 1.5$ and $R_2 = S_2 = 2.25$.
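
The likelihood ratio calculation described under Model 2 can be sketched as follows. This is a minimal illustration, assuming that the test statistic is referred to a chi-square distribution with one degree of freedom (one parameter removed) and that power is estimated as the proportion of replicates reaching a given significance level. The function names are hypothetical and do not correspond to options in EMIM or MENDEL.

```python
import math


def lrt_pvalue_1df(neg_loglik_null, neg_loglik_full):
    """Likelihood ratio test for a single extra parameter.

    The statistic is twice the difference in negative log likelihood. For
    one degree of freedom the chi-square upper tail probability equals
    erfc(sqrt(statistic / 2)).
    """
    stat = 2.0 * (neg_loglik_null - neg_loglik_full)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    return stat, math.erfc(math.sqrt(stat / 2.0))


def power(pvalues, alpha):
    """Proportion of replicates whose p-value falls below alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


# Hypothetical negative log likelihoods for a single replicate.
stat, pval = lrt_pvalue_1df(neg_loglik_null=105.0, neg_loglik_full=100.0)
```

Using only the standard library keeps the sketch self-contained; in practice a statistics package would normally be used for the chi-square tail probability.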



Figures 4 and 5 show a comparison of the null model 
(no estimated parameters) and the full model log like- 
lihoods from EMIM and MENDEL, for Models 1 and 
2 respectively. EMIM was set to assume HWE (since 
MENDEL assumes HWE by default). We see that the null 
and full model log likelihoods from the two programs
are very similar (Figures 4(A), 4(B), 5(A), 5(B)),
resulting in approximately equal powers and parameter 
estimates (Figures 4(C), 4(D), 5(C), 5(D)). For Model 3, 
EMIM and MENDEL similarly gave approximately equal 
log likelihoods and powers (results not shown). 

One difference between EMIM and MENDEL was the 
time taken to perform the analysis, with EMIM perform- 
ing considerably quicker than MENDEL. For example, the 
time to run model 3 (with 200 case/parent trios) showed 
that PREMIM and EMIM combined took 0.0257 sec- 
onds and MENDEL took 6.45 seconds (averaged over 300 
runs). This shows that PREMIM and EMIM combined 
were approximately 250 times faster than MENDEL in 
this example. The same analysis with 400 case/parent trios 
gave times of 0.0302 seconds for PREMIM and EMIM 
combined and 14.3 seconds for MENDEL (averaged over 

Figure 4 Mother-Child Interaction Effects Comparison with MENDEL, RHD. Plots showing the comparison of EMIM and MENDEL - "option 26,
Model 1: RHD" using simulated data. A: Plot of the null model log likelihood values calculated using EMIM and MENDEL. B: Plot of the full
(alternative) log likelihood values calculated using EMIM and MENDEL. C: The power to detect a genetic effect for p-values of 0.05, 0.01 and 0.001. D:
Plot of the MFG parameter estimates calculated using EMIM and MENDEL.



300 runs), showing PREMIM and EMIM to be approximately
472 times faster than MENDEL. A possible reason
for the difference in running times is the fact that the 
extended MFG model [11] implemented in MENDEL 
is a slightly more complicated model than the par- 
ent/offspring trio model implemented in EMIM (thus 
providing MENDEL with the ability to analyse larger 
pedigrees). 

Comparison with LEM 

Another program with the capability to analyse complex 
genetic effects (most notably mother/child/imprinting 
effects) is LEM [20]. LEM is a Windows-based log-linear 
modelling program designed primarily to be used via a 
graphical user interface, although it is possible to run 
it from the DOS command line, in order to implement 
scripts that allow the analysis of large numbers of loci 
or replicates. LEM takes an input parameter file which 



defines the model, the parameters to be estimated and 
the name of the input data file. We created input param- 
eter and data files based on examples provided by the 
authors of LEM [20] for case/parent trios and by [21] for 
case/mother and control/mother duos. 

1. Case/parent trios. SimPed [15] was used to simulate 
a single replicate of data at 8000 SNPs across a 
chromosome for 4000 case/parent trios. Child effects 
($R_1 = 1.5$, $R_2 = 2.25$) were simulated at SNP number
1004 and maternal effects ($S_1 = 2$, $S_2 = 3$) were
simulated at SNP number 6004. In both EMIM and
LEM we tested for maternal effects while allowing for
child and maternal imprinting effects (i.e. we
compared an alternative 5-parameter model ($R_1$, $R_2$,
$S_1$, $S_2$, $I_m$) with a null 3-parameter model ($R_1$, $R_2$,
$I_m$)). We calculated the p-value for LEM on the basis
of the reported log likelihoods by using the Wald

Figure 5 Mother-Child Interaction Effects Comparison with MENDEL, NIMA. Plots showing the comparison of EMIM and MENDEL - "option 26,
Model 2: NIMA" using simulated data. A: Plot of the null model log likelihood values (with child effects fitted, but no MFG interaction effect)
calculated using EMIM and MENDEL. B: Plot of the full (alternative) log likelihood values (fitting both child effects and MFG effect) calculated using
EMIM and MENDEL. C: The power to detect the MFG effect for p-values of 0.05, 0.01 and 0.001. D: Plot of the MFG parameter estimates calculated
using EMIM and MENDEL.



statistic as a $\chi^2$ value with 2 degrees of freedom. (The
p-value reported by LEM was not suitable as it is only
given to 3 decimal places, which was insufficient for
SNPs with p-values less than $10^{-3}$.)
2. Case/mother duos and control/mother duos.
Again, data were simulated at 8000 SNPs but this
time for 2000 case/mother duos and 2000
control/mother duos. Child effects ($R_1 = 1.5$,
$R_2 = 2.25$) were simulated at SNP number 1000 and
maternal effects ($S_1 = 2$, $S_2 = 3$) were simulated at
SNP number 6004. In both EMIM and LEM we
tested for maternal and child effects, i.e. we compared
a null model with no fitted parameters to an
alternative model with parameters ($R_1$, $R_2$, $S_1$, $S_2$).
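
For a test statistic with 2 degrees of freedom, as used for the case/parent trios in item 1, the p-value has a simple closed form, so it can be obtained to any precision without relying on a rounded value. The Python sketch below is an illustration only; the function names are hypothetical and the code is not the procedure actually used to produce the results reported here.

```python
import math


def chi2_sf_2df(stat):
    """Upper tail probability of a chi-square variable with 2 degrees of
    freedom, which has the closed form exp(-stat / 2)."""
    return math.exp(-stat / 2.0)


def neg_log10_p_2df(stat):
    """-log10 of the 2 df p-value, computed directly from the statistic so
    that very small p-values do not underflow to zero."""
    return stat / (2.0 * math.log(10.0))


# A hypothetical statistic of 40 corresponds to a p-value of about 2e-9,
# far below the three decimal places of a rounded p-value.
example_p = chi2_sf_2df(40.0)
example_neg_log10_p = neg_log10_p_2df(40.0)
```

Working on the $-\log_{10}$ scale is convenient because that is the scale on which the p-values are plotted across the chromosome.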

A comparison of EMIM versus LEM for the 
case/mother and control/mother duos is shown in 
Figure 6. Figure 6(A) and 6(B) show that the p-values 



across the chromosome appear to be indistinguishable, 
and Figure 6(G) shows that the p-values for each SNP 
from the two programs are indeed approximately equal. 
Figures 6(C) and 6(E) show that the estimates of $R_1$ and
$S_1$ are approximately equal and Figures 6(D) and 6(F)
show that $R_2$ and $S_2$ are also approximately equal, but
with more variability.

Figure 7 shows the same plots for the case/parent trios, 
but with the addition of estimates for the extra parameter 
$I_m$. We see that the p-values and parameter estimates provided
by the two programs are virtually indistinguishable.

These results indicate that the inference provided by 
LEM and EMIM is essentially identical. This is as expected 
given the mathematical equivalence [7,22] between the 
multinomial model fit by EMIM and the log linear model 
fit by LEM. The main difference between the programs is 
the time taken to perform the analysis, with EMIM per- 
forming considerably quicker than LEM. For example, the 
time taken to run the case/mother and control/mother
duos analysis across 8000 SNPs in PREMIM/EMIM was 
1 minute 21 seconds on a Linux machine (6-Core AMD 
Opteron™ Processor with 2.6 GHz CPUs) or 2 minutes 
4 seconds on Windows (using a 2-core Intel™ Processor 
with 2.93 GHz CPUs), whereas the same analysis in LEM 
took 16 hours, 52 minutes and 8 seconds on Windows (via 
the DOS command line). The difference in speed between 
the two programs for the case/parent trios analysis was 
not as extreme, with PREMIM/EMIM taking 3 minutes 7 
seconds on Linux or 4 minutes 49 seconds on Windows, 
versus LEM's time of 63 minutes 58 seconds on Windows. 
The improved speed for the LEM trios analysis was most 
likely due to the fact that it took fewer steps than the duos 
analysis during the likelihood maximization process (pos- 
sibly on account of the fact that the example parameter file 
we were using requested the program to switch to using 
a Newton-Raphson algorithm following 10 iterations of 
an EM algorithm). It is possible that differences between 
maximization algorithms and convergence criteria could 
account for some of the differences in speed between 
PREMIM/EMIM and LEM; we found it difficult to deter- 
mine how to obtain precise control over such factors in 
LEM and were forced to use input files that very closely 
matched the examples provided by [20,21]. Another fac- 
tor influencing speed could be the fact that LEM does not 
(as far as we are aware) allow the input of multiple SNPs 
simultaneously, meaning that we had to create and read 
into LEM a separate input file for each SNP analysed. 
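
To put the running times quoted above on a common scale, the following Python sketch converts them to seconds and forms the ratio of the LEM time to each PREMIM/EMIM time. The times are those reported in the text; the helper `to_seconds` and the variable names are introduced only for this example.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a reported running time to seconds."""
    return 3600 * hours + 60 * minutes + seconds


# Running times as reported in the text (LEM was run on Windows only).
lem_duos = to_seconds(hours=16, minutes=52, seconds=8)
lem_trios = to_seconds(minutes=63, seconds=58)
emim = {
    ("duos", "Linux"): to_seconds(minutes=1, seconds=21),
    ("duos", "Windows"): to_seconds(minutes=2, seconds=4),
    ("trios", "Linux"): to_seconds(minutes=3, seconds=7),
    ("trios", "Windows"): to_seconds(minutes=4, seconds=49),
}

# Ratio of the LEM time to each PREMIM/EMIM time.
speedup = {
    key: (lem_duos if key[0] == "duos" else lem_trios) / t
    for key, t in emim.items()
}

for (analysis, machine), ratio in speedup.items():
    print(f"{analysis:5s} {machine:7s} {ratio:6.1f}x")
```

The ratios come to roughly 750 (Linux) and 490 (Windows) for the duos analysis, and roughly 21 (Linux) and 13 (Windows) for the trios analysis. The Linux figures compare runs on different machines, so the Windows ratios are the more direct comparison.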

Conclusions 

Here we have presented two new computer tools, PRE- 
MIM and EMIM, for the estimation of parental and 
child genetic effects, based on genotype data from a vari- 
ety of different child-parent configurations. The current 
version of EMIM improves upon the early beta version 
described in [7] by allowing a larger set of possible child- 
parent configurations, a larger range of optional likelihood 
assumptions, and by the development of the companion 
program, PREMIM, for generating the required input files 
from standard PLINK-format files, considerably improv- 
ing the ease with which EMIM can be applied to 
real data. 

In application to simulated data, we have shown that 
the inference provided by EMIM is essentially equiva- 
lent to that provided by alternative (competing) software 
packages such as MENDEL and LEM. EMIM does have 
the advantage of allowing easy implementation of a wider 
class of models than are most easily implemented in 
MENDEL and LEM, although the expert MENDEL/LEM 
user could probably achieve the same model flexibility 
through judicious choice of parameter restrictions. How- 
ever, PREMIM and EMIM (used in combination) consid- 
erably outperform MENDEL and LEM in terms of speed 



of execution, an advantage that is likely to be all the more 
important when applying these approaches to large-scale 
data sets such as those generated in genome-wide associa- 
tion studies. To allow further increases in speed, PREMIM 
and EMIM also have the advantage of allowing easy paral- 
lel processing (e.g. on a computer cluster) by dividing the 
SNPs to analyse into different batches. 

Limitations of PREMIM and EMIM include the fact 
that larger pedigrees are divided into case/parent or con- 
trol/parent trios (or smaller sub-units) prior to analysis, 
and the fact that SNPs are analysed one at a time, without 
borrowing information from neighbouring markers (e.g. 
on the basis of regional linkage disequilibrium patterns). 
Methods for dealing with larger pedigrees, valid under the 
assumptions of random mating and/or Hardy-Weinberg
equilibrium (HWE), have been described by [10,11], while 
[23] present an approach that models haplotypes rather 
than individual SNPs, allowing the borrowing of informa- 
tion (including information on parent-of-origin or missing 
genotype data) across neighbouring SNPs. Both of these 
features would be valuable additions to future releases of 
our software. Nevertheless, the current versions of EMIM 
and PREMIM provide easy-to-use command-line tools 
for the analysis of pedigree data, allowing testing and 
estimation of a variety of parental and child genotype 
relative risks. 